Load packages
CITE-seq dataset
In this tutorial, we use a public CITE-seq dataset to illustrate joint analysis with LinQ-View. The data can be downloaded from NCBI: RNA and ADT.
More details about this dataset can be found on the Seurat website.
Step 1 Load data
# Load in the RNA UMI matrix
cbmc.rna <- as.sparse(x = read.csv(file = "../Data/citeseq/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz",
sep = ",", header = TRUE, row.names = 1))
cbmc.rna <- CollapseSpeciesExpressionMatrix(object = cbmc.rna)
# Load in the ADT UMI matrix
cbmc.adt <- as.sparse(x = read.csv(file = "../Data/citeseq/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz",
sep = ",", header = TRUE, row.names = 1))
cbmc.adt <- cbmc.adt[setdiff(x = rownames(x = cbmc.adt), y = c("CCR5", "CCR7",
    "CD10")), ]
Step 2 Create Seurat object
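Before the two matrices are combined into one object, it can help to confirm that they describe the same cells in the same order. The base R sketch below uses small made-up matrices (`rna` and `adt`, standing in for `cbmc.rna` and `cbmc.adt`); the names and values are illustrative assumptions, not part of the LinQ-View workflow.

```r
# Toy stand-ins for the RNA and ADT matrices (features x cells).
# All values and cell names are made up for illustration.
rna <- matrix(1:12, nrow = 3,
              dimnames = list(paste0("gene", 1:3), c("c1", "c2", "c3", "c4")))
adt <- matrix(1:6, nrow = 2,
              dimnames = list(c("CD3", "CD4"), c("c3", "c1", "c2")))

# Keep only cells present in both modalities, in a shared order
shared_cells <- intersect(colnames(rna), colnames(adt))
rna <- rna[, shared_cells, drop = FALSE]
adt <- adt[, shared_cells, drop = FALSE]

# Both matrices now have identical column (cell) names
stopifnot(identical(colnames(rna), colnames(adt)))
```

Cells found in only one modality (here `c4`) are dropped, so every remaining cell has both an RNA and an ADT profile.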
Step 3 Pre-process
Users can use either the original Seurat functions or our functions for the pre-processing steps.
1) Filter out unwanted cells (optional)
For this dataset, we don't need to filter out unwanted cells.
2) Remove unwanted genes (optional)
For this dataset, we don't need to remove unwanted genes.
3) Normalization
Normalize the data for both modalities: CLR for ADT and log-normalization for RNA.
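The two normalizations named here can be sketched in base R on toy matrices. This is only an illustration: the matrices, the helper names `log_normalize` and `clr_normalize`, the scale factor of 10000, and the choice to apply a log1p-based CLR per feature across cells are all assumptions. On real data, use the packaged Seurat or LinQ-View functions.

```r
# Toy count matrices (features x cells); values are made up for illustration.
rna_counts <- matrix(c(0, 3, 1, 10, 2, 0, 5, 4, 1), nrow = 3,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     c("cell1", "cell2", "cell3")))
adt_counts <- matrix(c(120, 5, 30, 8, 200, 15), nrow = 2,
                     dimnames = list(c("CD3", "CD19"),
                                     c("cell1", "cell2", "cell3")))

# Log-normalization (hypothetical helper): scale each cell to a fixed
# total count, then apply log(1 + x)
log_normalize <- function(counts, scale_factor = 10000) {
  log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
}

# CLR (hypothetical helper): divide each value by a geometric-mean-like
# term computed with log1p, then apply log(1 + x); done per feature here
clr_normalize <- function(counts) {
  clr <- function(x) log1p(x / exp(sum(log1p(x[x > 0])) / length(x)))
  t(apply(counts, 1, clr))
}

rna_norm <- log_normalize(rna_counts)
adt_norm <- clr_normalize(adt_counts)
```

Both results keep the features x cells layout of the input, so they can be used in place of the raw counts in later steps of a toy pipeline.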
4) Identify HVGs for RNA data
Call the Seurat function to identify highly variable genes (HVGs) for the RNA data.
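To show the idea behind HVG selection, the base R sketch below ranks genes by a simple dispersion score (variance divided by mean) on simulated counts. This is a simplified stand-in, not the method Seurat applies; the simulated data, the cutoff of 20 genes, and all variable names are assumptions made for illustration.

```r
set.seed(7)

# Simulated counts (genes x cells): mostly Poisson noise
counts <- matrix(rpois(200 * 50, lambda = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:50)))
# Replace the first five genes with overdispersed counts
counts[1:5, ] <- matrix(rnbinom(5 * 50, mu = 2, size = 0.2), nrow = 5)

# Dispersion score per gene: variance / mean (0 for genes never detected)
gene_mean <- rowMeans(counts)
gene_var <- apply(counts, 1, var)
dispersion <- ifelse(gene_mean > 0, gene_var / gene_mean, 0)

# Keep the 20 genes with the highest dispersion
hvgs <- names(sort(dispersion, decreasing = TRUE))[1:20]
```

Genes whose variance is large relative to their mean rank first, which is the intuition behind restricting PCA to variable genes.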
Step 4 Linear dimension reduction (PCA)
Call the Seurat function directly for linear dimension reduction (PCA).
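As a rough illustration of what this step produces, the sketch below runs PCA with base R's `prcomp` on a simulated expression matrix and extracts per-cell coordinates and the fraction of variance explained by each PC. The simulated data and the choice of 10 PCs are assumptions for the example; the tutorial itself relies on Seurat's implementation.

```r
set.seed(1)

# Simulated normalized expression (genes x cells); values are random
expr <- matrix(rnorm(50 * 30), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("cell", 1:30)))

# prcomp expects observations in rows, so transpose to cells x genes
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)

# Coordinates of each cell on the first 10 principal components
cell_embeddings <- pca$x[, 1:10]

# Fraction of total variance explained by each PC
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```

The `cell_embeddings` matrix (cells x PCs) is the kind of reduced representation on which cell-cell distances are later computed.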
Step 5 Determine number of PCs
Call the Seurat function JackStraw to determine the number of PCs.
cbmc <- JackStraw(cbmc, num.replicate = 100)
cbmc <- ScoreJackStraw(cbmc, dims = 1:20)
JackStrawPlot(cbmc, dims = 1:20)
Step 6 Distance calculation and joint distance calculation
Calculate cell-cell distances for RNA, ADT, and the joint analysis. The number of PCs is set to 20 by default.
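The general shape of this step can be sketched in base R: compute one cell-cell distance matrix per modality, put both on a common scale, and combine them. The weighted sum used below is an assumption chosen for simplicity and is not claimed to be LinQ-View's integration model; the simulated inputs, the helper `rescale01`, and the weight `w` are likewise illustrative.

```r
set.seed(42)
n_cells <- 20

# Simulated inputs: RNA as cells x 20 PCs, ADT as cells x 10 proteins
rna_pcs <- matrix(rnorm(n_cells * 20), nrow = n_cells)
adt_expr <- matrix(rnorm(n_cells * 10), nrow = n_cells)
rownames(rna_pcs) <- rownames(adt_expr) <- paste0("cell", seq_len(n_cells))

# Euclidean cell-cell distances within each modality
rna_dist <- as.matrix(dist(rna_pcs))
adt_dist <- as.matrix(dist(adt_expr))

# Hypothetical helper: rescale a distance matrix to the range [0, 1]
rescale01 <- function(d) d / max(d)

# Assumed combination rule: weighted sum of the rescaled distances
w <- 0.5
joint_dist <- w * rescale01(rna_dist) + (1 - w) * rescale01(adt_dist)
```

Rescaling first keeps one modality from dominating the joint distance simply because its raw distances are larger.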
Step 7 Non-linear dimension reduction (UMAP and t-SNE)
Run UMAP as the non-linear dimension reduction for the RNA, ADT, and joint analyses.
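The key point of this step is that the 2D embedding is computed from a precomputed cell-cell distance matrix rather than from the expression values directly. The sketch below shows that pattern with classical multidimensional scaling (`cmdscale` in base R), which is swapped in here as a simple stand-in and is not UMAP or t-SNE. The simulated two-group data and all variable names are assumptions.

```r
set.seed(3)

# Simulated cells from two groups with different means (cells x features)
coords <- rbind(matrix(rnorm(20 * 5, mean = 0), ncol = 5),
                matrix(rnorm(20 * 5, mean = 4), ncol = 5))
rownames(coords) <- paste0("cell", 1:40)

# Precomputed cell-cell distances
d <- dist(coords)

# Classical MDS: 2D coordinates that approximately preserve the distances
embedding <- cmdscale(d, k = 2)
colnames(embedding) <- c("dim1", "dim2")
```

Calling `plot(embedding)` on the result should show the two simulated groups as separate clouds, since the embedding preserves the large between-group distances.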
Step 8 Clustering
cbmc <- clusteringFromDistance(object = cbmc, assay = "All", resolution = c(0.9,
0.9, 0.9))
# contribution of two modalities
distHeatMap(object = cbmc)
Step 9 Visualization ADT vs RNA vs Joint
1) Cell clusters
plots <- generateGridDimPlot(cbmc, legend = FALSE, darkTheme = FALSE)
listPlot(object = plots, align = "h")
# Users can also plot a subset of these plots by index, figure ident, or
# figure map info:
# listPlot(object = plots, fig.ident = 'RNA')
# listPlot(object = plots, fig.ident = 'RNA', fig.map = 'RNA')
# Use plotInfo() to get the index, figure ident, and figure map information,
# then plot figures by index:
plotInfo(plots)
# listPlot(object = plots, fig.id = 1)
As indicated by the red circle, joint analysis identified two distinct NK cell subsets: CD8+ NK and CD8- NK. These two NK subsets can also be identified using ADT information only. The heatmap below shows the distinct surface protein patterns of these two NK subsets (joint clusters 5 and 6). The two subsets cannot be distinguished using RNA only because their transcriptional profiles are essentially identical.
As indicated by the blue circle, joint analysis identified two distinct CD4 T cell subsets: naive CD4 T and memory CD4 T. These two subsets can also be identified using RNA information only, but they cannot be distinguished using ADT only because they have similar cell surface protein patterns.
2) Heat maps
Heatmap for joint clusters
# Heatmap for joint clusters
heatMapPlot(object = cbmc, group.by = "jointClusterID", height.rel = 1, adt.label = TRUE)
Heatmap for RNA clusters
# Heatmap for RNA clusters
heatMapPlot(object = cbmc, group.by = "rnaClusterID", height.rel = 1, adt.label = TRUE)
Heatmap for ADT clusters
# Heatmap for ADT clusters
heatMapPlot(object = cbmc, group.by = "adtClusterID", height.rel = 1, adt.label = TRUE)